WebSockets

Let's learn about the WebSocket protocol and discuss its pros and cons.

Motivation

Most web APIs use HTTP as their underlying protocol to transfer data, and HTTP is often considered one of the best options for executing batch tasks asynchronously. But for two-way, real-time communication such as chat, live streaming, and gaming, HTTP falls short because it is a request-response protocol in which the server usually closes the connection after sending the response. The table below describes some HTTP-based techniques and their limitations in achieving bidirectional communication:

| Technique | Description | Limitation |
|---|---|---|
| Short polling | The client requests updates from the server at fixed, short intervals. The server responds whether it has an update or not. | Sends too many unnecessary requests for updates |
| Long polling | The client requests updates from the server and the channel is left open (based on some constraints). The server responds when it has an update. | Because HTTP follows a request-response model, the client needs multiple concurrent connections to send data and receive updates, which wastes resources. |
| HTTP streaming | The server streams bytes of data continuously to the client over a single connection that is kept open. | Communication remains half-duplex, since only the server sends data over the open connection |
Note: The table above focuses on HTTP/1.1 because later versions had not been introduced when WebSocket was first developed.
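The cost of short polling in the table above can be made concrete with a small simulation. This is only a sketch: the function name `short_poll_stats`, the update times, and the polling interval are hypothetical values chosen for illustration, and no real network requests are made.

```python
def short_poll_stats(update_times, interval, duration):
    """Count polls sent, and polls that found no update, under short polling."""
    pending = sorted(update_times)
    requests = empty = 0
    t = interval
    while t <= duration:
        requests += 1
        # Deliver every update that arrived since the previous poll.
        delivered = [u for u in pending if u <= t]
        pending = pending[len(delivered):]
        if not delivered:
            empty += 1
        t += interval
    return requests, empty


# Two updates in 20 seconds, polled once per second (hypothetical numbers).
requests, empty = short_poll_stats(update_times=[3, 17], interval=1, duration=20)
print(requests, empty)  # 20 18
```

In this scenario, 18 of the 20 requests carry no update, which is the waste the table refers to.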

We conclude from the above discussion that we need a different approach to achieve two-way data transfer between the server and client, one in which the server can send data without waiting for the client's requests. We need an approach that has low latency and avoids repeated TCP handshakes by keeping the connection open for as long as it is needed.

What is a WebSocket?

WebSocket was introduced in 2011 to enable full-duplex asynchronous communication over a single TCP connection and thereby use resources efficiently. An HTTP connection restricts TCP to one-sided communication, where the client always starts the exchange due to the request-response model. In other words, the client first sends a request and then the server responds to it, which is half-duplex communication. In contrast, WebSocket takes full advantage of the TCP connection, allowing clients and servers to send or receive data on demand.

Half-duplex HTTP and full-duplex WebSocket connections

WebSocket leverages the underlying TCP channel and utilizes its full-duplex nature: the client and the server can send and receive data simultaneously. WebSocket is a stateful protocol that performs faster than HTTP for this kind of exchange because it's lightweight and doesn't carry the overhead of large headers with each message.

Note: The WebSocket protocol is detailed in IETF's RFC 6455, and its API is documented separately by the W3C.

A WebSocket establishes an HTTP connection and then upgrades it to the WebSocket protocol. All the transmission happens directly on the TCP channel. The URLs for connections using WebSocket begin with ws:// and wss:// for non-TLS and TLS-based connections, respectively.
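As a rough illustration of the two URL schemes, the sketch below resolves a WebSocket URL into a host, a port, and a TLS flag. The helper name `websocket_endpoint` and the `example.com` URLs are hypothetical; the default ports 80 and 443 are the ones WebSocket shares with HTTP and HTTPS.

```python
from urllib.parse import urlsplit

# ws:// falls back to port 80 and wss:// to port 443 when no port is given.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def websocket_endpoint(url):
    """Return (host, port, use_tls) for a ws:// or wss:// URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a WebSocket URL: {url}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    use_tls = parts.scheme == "wss"
    return parts.hostname, port, use_tls


print(websocket_endpoint("wss://example.com/chat"))   # ('example.com', 443, True)
print(websocket_endpoint("ws://example.com:8080/"))   # ('example.com', 8080, False)
```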

Upgrading HTTP to WebSocket connection

The illustration above depicts the conversion from an HTTP to a WebSocket connection. We can see that the transport layer connection (TCP) stays the same while the application layer protocol is upgraded from HTTP to WebSocket.

How does it work?

A WebSocket connection starts with an HTTP connection established through a three-way TCP handshake. Afterward, an HTTP GET request is initiated to switch the protocol to WebSocket. The connection upgrade request can be accepted or rejected by the HTTP server. If the server is compatible with the WebSocket protocol and the upgrade request is valid, the connection is upgraded to a WebSocket connection. The response contains the status code 101 (Switching Protocols) and the value for the field, Sec-WebSocket-Accept.
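The accept-or-reject decision described above could look roughly like the following on the server side. This is a minimal sketch, not a complete server: the function name `check_upgrade_request` is hypothetical, and the specific error statuses returned for invalid requests (405, 400) are illustrative choices. The 426 status for an unsupported version follows RFC 6455.

```python
import base64
import binascii


def check_upgrade_request(method, headers):
    """Return 101 if the request is a valid WebSocket upgrade, else an error status."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value.strip() for name, value in headers.items()}
    if method != "GET":
        return 405
    if h.get("upgrade", "").lower() != "websocket":
        return 400
    tokens = [t.strip().lower() for t in h.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        return 400
    if h.get("sec-websocket-version") != "13":
        return 426  # Upgrade Required: unsupported protocol version
    try:
        key = base64.b64decode(h.get("sec-websocket-key", ""), validate=True)
    except (binascii.Error, ValueError):
        return 400
    # The key must decode to exactly 16 bytes.
    return 101 if len(key) == 16 else 400


status = check_upgrade_request("GET", {
    "Connection": "Upgrade",
    "Upgrade": "websocket",
    "Sec-WebSocket-Key": "eB9AWsQe8+SDcwWRjpGSow==",
    "Sec-WebSocket-Version": "13",
})
print(status)  # 101
```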

HTTP Upgrade headers

The headers of the initial upgrade request and the server's response are shown below:

The Upgrade request

GET / HTTP/1.1
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Key: eB9AWsQe8+SDcwWRjpGSow==
Sec-WebSocket-Version: 13

The Upgrade response

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: zglqWZt8l79gwpiFBgKihrobE8I=

For simplicity, we have removed some fields from the headers above.
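To show where these headers come from, here is a sketch of how a client might assemble the upgrade request. The function name `build_upgrade_request` and the host `example.com` are hypothetical. The `Host` header is one of the fields omitted from the listing above, and the key is 16 random bytes encoded in base64.

```python
import base64
import os


def build_upgrade_request(host, path="/"):
    """Return (key, raw_request_bytes) for a WebSocket upgrade request."""
    # A fresh random 16-byte value, base64-encoded, for every connection.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: Upgrade",
        "Upgrade: websocket",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",  # the two empty strings produce the blank line
        "",  # that terminates the HTTP header section
    ]
    return key, "\r\n".join(lines).encode("ascii")


key, request = build_upgrade_request("example.com", "/chat")
print(request.decode("ascii"))
```

The client keeps the generated key so that it can later check the server's Sec-WebSocket-Accept value against it.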

The headers above contain the following notable fields:

  • Status code 101 shows that the protocol has been upgraded successfully and that the endpoints can now exchange WebSocket frames.

  • The Sec-WebSocket-Key is a base64-encoded, randomly generated 16-byte value sent by the client. It is not encrypted or hashed by the client. Its purpose is to let the client confirm that the response comes from a server that understands the WebSocket protocol and actually processed the handshake, rather than from a server or cache that misinterpreted it as ordinary HTTP.

  • The server generates the Sec-WebSocket-Accept field by appending a fixed Globally Unique Identifier (GUID), 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, to the client-provided Sec-WebSocket-Key, computing the SHA-1 hash of the result, and base64-encoding the hash. The client recomputes this value and compares it with the one it receives, which confirms that the server has accepted the connection upgrade.
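The derivation of Sec-WebSocket-Accept defined in RFC 6455 (append the fixed GUID to the key, hash with SHA-1, then base64-encode) can be sketched in a few lines. The function name `compute_accept` is hypothetical; the key used in the example is the sample key from RFC 6455 rather than the one in the listing above.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key):
    """Derive the Sec-WebSocket-Accept value from a Sec-WebSocket-Key."""
    # The GUID is appended to the key exactly as it was received.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Note that no decryption is involved: SHA-1 is a one-way hash, so the client verifies the response by repeating the same computation on the key it sent.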

Quiz

Question 3

How does the client know it requires a WebSocket connection in a particular scenario?

Answer

The developers of the client-side application are responsible for initializing the upgrade requests. This is a design decision that has to be made before developing the frontend of the application.

Once the connection is established and successfully upgraded to the WebSocket protocol, the endpoints start exchanging frames, which may begin with control frames. WebSocket has two types of frames: control frames and data frames. Each frame is identified by a 4-bit opcode. Control frames report on and manage the state of the connection and can carry a maximum of 125 bytes of payload. Data frames carry application data and are identified by an opcode whose most significant bit is zero, whereas control frames have that bit set. A list of common frames with brief descriptions is given below:

| Frame | Type | Opcode | Description |
|---|---|---|---|
| Text | Data | 0x1 | Indicates that the information carried by the frame is UTF-8 text |
| Binary | Data | 0x2 | Indicates that the information carried by the frame is in binary format |
| Ping | Control | 0x9 | Sent by either endpoint (commonly the server) to check whether the peer is alive |
| Pong | Control | 0xA | Used to acknowledge a ping frame when the connection is live |
| Close | Control | 0x8 | Both endpoints exchange close frames for terminating the connection |
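To make the opcode rule concrete, the sketch below decodes the header of a raw frame and classifies it as a control or data frame by checking the most significant bit of the 4-bit opcode. The function name `parse_frame_header` and the returned dictionary layout are hypothetical; the bit layout follows RFC 6455, in which ping is opcode 0x9.

```python
import struct


def parse_frame_header(data):
    """Decode the header fields of a WebSocket frame from raw bytes."""
    fin = bool(data[0] & 0x80)       # final fragment flag
    opcode = data[0] & 0x0F          # 4-bit opcode
    masked = bool(data[1] & 0x80)    # set on client-to-server frames
    length = data[1] & 0x7F
    offset = 2
    if length == 126:                # 16-bit extended payload length
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:              # 64-bit extended payload length
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    if masked:
        offset += 4                  # 4-byte masking key
    # Control opcodes have the most significant of the 4 bits set.
    kind = "control" if opcode & 0x8 else "data"
    return {"fin": fin, "opcode": opcode, "kind": kind,
            "payload_length": length, "header_size": offset}


# An unmasked text frame carrying "Hello", followed by an empty ping frame.
print(parse_frame_header(b"\x81\x05Hello"))
print(parse_frame_header(b"\x89\x00"))
```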

When closing a WebSocket connection, close frames are exchanged between the endpoints. After an endpoint sends a close frame, it must not send any more data frames, and an endpoint that receives a close frame replies with its own close frame if it has not already sent one. Any metadata stored to maintain the TCP connection information must also be cleaned up by the endpoints during connection teardown. Finally, under normal circumstances, the server raises the FIN flag first, and both endpoints close the TCP connection once the close sequence is complete.
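A close frame is an ordinary control frame whose payload, when present, starts with a 2-byte status code optionally followed by a UTF-8 reason. The sketch below builds one as a server would send it, that is, unmasked. The function name `build_close_frame` is hypothetical; status code 1000 is the RFC 6455 code for a normal closure.

```python
import struct


def build_close_frame(code=1000, reason=""):
    """Build an unmasked (server-to-client) close frame."""
    # Payload: 2-byte big-endian status code followed by a UTF-8 reason.
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payload is limited to 125 bytes")
    # 0x88 = FIN bit set + opcode 0x8 (close); second byte = length, mask bit clear.
    return bytes([0x88, len(payload)]) + payload


print(build_close_frame(1000, "bye"))  # b'\x88\x05\x03\xe8bye'
```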

Figure: Steps involved in WebSocket communication between the client and the server: the TCP connection is established, HTTP is switched to WebSocket, control frames are exchanged, data frames are transmitted, closing frames are exchanged, and the TCP connection is closed

Advantages

WebSockets perform well for real-time applications and provide the following benefits:

  • Bidirectional communication channel

  • Both server and client can send and receive data on demand

  • Higher frequency of data exchange

  • Faster data transmission with a frame header of only 2 to 14 bytes

  • Compatibility with existing infrastructure

  • Passes through most firewalls because it uses the default HTTP and HTTPS ports, 80 and 443
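The small per-message overhead mentioned in the list above can be estimated from the frame layout in RFC 6455: 2 fixed bytes, an optional 2-byte or 8-byte extended length, and a 4-byte masking key on frames sent by the client. The function name `frame_header_size` is hypothetical.

```python
def frame_header_size(payload_length, masked):
    """Return the WebSocket frame header size in bytes for a given payload."""
    size = 2                      # FIN/opcode byte + mask bit/length byte
    if payload_length > 65535:
        size += 8                 # 64-bit extended length
    elif payload_length > 125:
        size += 2                 # 16-bit extended length
    if masked:
        size += 4                 # masking key (client-to-server frames)
    return size


for length in (100, 1000, 70000):
    print(length, frame_header_size(length, masked=False),
          frame_header_size(length, masked=True))
```

Server-to-client frames therefore need between 2 and 10 header bytes, and client-to-server frames between 6 and 14, compared with the hundreds of bytes that HTTP headers often add to each request.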

Disadvantages

WebSocket is a relatively new protocol and is not as mature as HTTP-style architectures. While it's great for specific scenarios, such as multiplayer gaming, live streaming, and video conferencing, it also has some limitations:

  • Horizontal scaling is complex because we can't load balance and reroute the messages coming from a client to another server once the connection is upgraded to WebSocket.

  • Connection failures have a large impact. Because the connection is stateful and the frame headers carry no information about the sending and receiving ends, it's difficult to recover a lost connection.

Point to Ponder

Question

Why is it difficult to scale WebSockets horizontally?

Answer

Due to the stateful nature of WebSocket, both endpoints are bound to the channel, and we cannot add more machines to reroute requests and distribute the workload among different servers.

While horizontally scaling a single WebSocket connection is difficult, distributing different WebSocket connections to different servers can be achieved by an appropriate intermediary, such as a load balancer or an API gateway.
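One way an intermediary might distribute different connections across servers is to hash a stable client identifier at connection time, so that each new connection lands on one backend and stays there for its lifetime. This is only an illustrative sketch of that idea, not how any particular load balancer works; the function name `pick_server`, the client identifier, and the server names are all hypothetical.

```python
import hashlib


def pick_server(client_id, servers):
    """Choose a backend for a new WebSocket connection by hashing the client ID."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer index into the server list.
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["ws-backend-1", "ws-backend-2", "ws-backend-3"]  # hypothetical names
print(pick_server("user-42", servers))
```

The choice is made once, when the connection is opened. After the upgrade, all frames on that connection go to the same backend, which is why an individual connection cannot be rebalanced later.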
